Scalability of the SAS/STAT HPGENSELECT High-Performance Analytical Procedure: A comparison with RevoScaleR

Abstract

Effectively implementing high-performance analytics software solutions in the insurance industry

Executive Summary

At the Strata Conference on October 25, 2012, the research and planning division of a large insurance corporation (hereafter "insurer") presented various methods that they used to model 150 million observations of insurance data. A summary of their presentation is available at:

• The GENMOD procedure in SAS/STAT® software
• Custom MapReduce code on a Hadoop cluster
• Open-source R
• Revolution R Enterprise using RevoScaleR

The insurer reported that PROC GENMOD took more than five hours to fit a Poisson regression model, whereas Revolution Analytics used the RevoScaleR package to fit the model in 5.7 minutes. However, this is not an "apples to apples" comparison because RevoScaleR was run on a cluster of computers, whereas the GENMOD procedure executes only on a single server. A more informative comparison can be made by using the HPGENSELECT procedure, which had not yet been released at the time of the Strata comparison. Introduced in SAS/STAT 12.3 in June 2013, PROC HPGENSELECT runs in either single-machine mode (multiple threads on a single machine) or distributed mode (multiple threads on multiple machines). Distributed mode requires SAS® High-Performance Statistics.

Purpose

This paper compares the performance of the HPGENSELECT procedure with results cited for the RevoScaleR package by using data that are similar to the insurer's data. The paper also demonstrates the scalability of the HPGENSELECT procedure by using two sizes of data sets and three different computing environments.

Results

On a small grid with two nodes, the HPGENSELECT procedure fits a Poisson regression model with 150 million observations in 159 seconds, which is less than half the time that RevoScaleR required on a somewhat larger grid. On a grid with 140 nodes, the HPGENSELECT procedure solves the problem in 22 seconds.
The scalability of the HPGENSELECT procedure is demonstrated by increasing the size of the data set. For a data set that has the same variables and one billion observations, the procedure executes in less than one minute. These results, which are summarized graphically in Figure 1, show that the HPGENSELECT procedure provides a faster alternative.
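
The timing claims in the abstract can be checked with a little arithmetic. The following Python sketch uses only the figures quoted above (5.7 minutes for RevoScaleR, 159 seconds and 22 seconds for PROC HPGENSELECT); the helper name `speedup` is introduced here purely for illustration and does not come from the paper. The ratios describe these particular reported runs, which used different hardware, so they should not be read as general benchmarks.

```python
# Timings quoted in the abstract, converted to seconds.
REVOSCALER_SECONDS = 5.7 * 60            # 5.7 minutes on the insurer's cluster
HPGENSELECT_2_NODES_SECONDS = 159.0      # 150 million observations, 2-node grid
HPGENSELECT_140_NODES_SECONDS = 22.0     # 150 million observations, 140-node grid


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the baseline run."""
    return baseline_seconds / new_seconds


# HPGENSELECT on each grid compared with the reported RevoScaleR time.
vs_revo_small_grid = speedup(REVOSCALER_SECONDS, HPGENSELECT_2_NODES_SECONDS)
vs_revo_large_grid = speedup(REVOSCALER_SECONDS, HPGENSELECT_140_NODES_SECONDS)

# Fraction of the RevoScaleR time used by the 2-node run ("less than half").
fraction_of_revo_time = HPGENSELECT_2_NODES_SECONDS / REVOSCALER_SECONDS

# Gain from moving the same problem from the 2-node grid to the 140-node grid.
grid_scaling = speedup(HPGENSELECT_2_NODES_SECONDS, HPGENSELECT_140_NODES_SECONDS)

print(f"2-node grid vs RevoScaleR:   {vs_revo_small_grid:.2f}x faster")
print(f"140-node grid vs RevoScaleR: {vs_revo_large_grid:.2f}x faster")
print(f"2-node time as a fraction of RevoScaleR time: {fraction_of_revo_time:.2f}")
print(f"140-node grid vs 2-node grid: {grid_scaling:.2f}x faster")
```

The 2-node run uses roughly 46% of the reported RevoScaleR time, which is consistent with the "less than half" statement in the abstract.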

Related resources

SUGI 27: SAS(r) Meets Big Iron: High Performance Computing in SAS(r) Analytical Procedures

Version 9 targets the heavy-duty analytic procedures in SAS® for high performance computing enhancements. These enhancements encompass both algorithmic improvements and modifications to exploit multiprocessor hardware. This paper provides a survey of this development and the performance gains obtained in several procedures in SAS/STAT and Enterprise Miner. Some general scalability issues are ...

SUGI 27: Up and Out: Where We're Going with Scalability in SAS(r) Version 9

This paper gives an overview of the ways that SAS is addressing performance through scalability in SAS Version 9. Scalability features have been implemented in many areas of SAS Version 9 to allow your applications to scale up and scale out. These include: • Multi-Process (MP) CONNECT, • the Scalable Performance Data Engine (SPDE engine), • certain SAS/ACCESS engines, • several scalable SAS pro...

The RANDOM Statement and More: Moving On with PROC MCMC

The MCMC procedure, first released in SAS/STAT® 9.2, provides a flexible environment for fitting a wide range of Bayesian statistical models. Key enhancements in SAS/STAT 9.22 and 9.3 offer additional functionality and improved performance. The RANDOM statement provides a convenient way to specify linear and nonlinear random-effects models along with substantially improved performance. The MCMC...

Analyzing a Regression Model with a General Positive Definite Covariance Matrix with The SAS System

This article discusses and proposes a procedure for the analysis of the univariate linear regression model with known general positive definite covariance matrix with SAS/STAT software of the SAS System. Estimation of parameters, hypothesis testing, estimation under constraints and collinearity and influence diagnostics are reviewed. An example is given to illustrate the procedure.

Improving LoRaWAN Performance Using Reservation ALOHA

LoRaWAN is one of the new and updated standards for IoT applications. However, the expected high density of peripheral devices for each gateway and the absence of an operative synchronization mechanism between the gateway and the peripherals challenge the network's scalability. In this paper, we propose to normalize the communication of LoRaWAN networks using a Reservation-ALOHA (R-A...

Publication date: 2014